Manifold Assumption and Defenses Against Adversarial Perturbations
Authors
Abstract
In the adversarial-perturbation problem of neural networks, an adversary starts with a neural network model F and a point x that F classifies correctly, and applies a small perturbation to x to produce another point x′ that F classifies incorrectly. In this paper, we propose taking into account the inherent confidence information produced by models when studying adversarial perturbations, where a natural measure of “confidence” is ‖F(x)‖∞ (i.e., how confident F is about its prediction). Motivated by a thought experiment based on the manifold assumption, we propose a “goodness property” of models, which states that the confident regions of a good model should be well separated. We give formalizations of this property and examine existing robust training objectives in view of them. Interestingly, we find that a recent objective by Madry et al. encourages training a model that satisfies our formal version of the goodness property well, but exerts only weak control over points that are misclassified but have low confidence. However, if Madry et al.’s model is indeed a good solution to their objective, then good and bad points become distinguishable, and we can try to embed uncertain points back into the closest confident region to get (hopefully) correct predictions. We therefore propose embedding objectives and algorithms, and perform an empirical study of this method. Our experimental results are encouraging: Madry et al.’s model wrapped with our embedding procedure achieves an almost perfect success rate in defending against attacks that the base model fails on, while retaining good generalization behavior.
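As a rough illustration of the two ingredients described in the abstract, the following Python sketch (using PyTorch) measures confidence as ‖F(x)‖∞, i.e. the largest softmax probability, and, for a low-confidence input, searches a small ℓ∞ ball around it for a nearby point where the model is confident before predicting. The classifier model, the radius eps, the threshold tau, and the gradient-ascent search are illustrative assumptions, not the paper's exact embedding objective.

import torch

def confidence(model, x):
    # ||F(x)||_inf: the largest softmax probability the model assigns to x.
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)
    return probs.max(dim=1).values

def embed_and_predict(model, x, eps=0.1, steps=20, step_size=0.02, tau=0.9):
    # If the model is already confident at x, keep its prediction as-is.
    if confidence(model, x).min() >= tau:
        with torch.no_grad():
            return model(x).argmax(dim=1)
    # Otherwise: projected gradient ascent on confidence inside an eps-ball
    # around x (a stand-in for "embedding x back to the closest confident
    # region"), then predict at the point found.
    z = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        conf = torch.softmax(model(z), dim=1).max(dim=1).values.sum()
        grad = torch.autograd.grad(conf, z)[0]
        with torch.no_grad():
            z = z + step_size * grad.sign()      # ascend on confidence
            z = x + (z - x).clamp(-eps, eps)     # project back into the L_inf ball
        z.requires_grad_(True)
    with torch.no_grad():
        return model(z).argmax(dim=1)

The gradient-ascent search here is only a placeholder for the embedding objectives and algorithms proposed in the paper; the essential point it shows is that the final prediction is read off at the re-embedded point rather than at the original uncertain input.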
Similar papers
On the Connection between Differential Privacy and Adversarial Robustness in Machine Learning
Adversarial examples in machine learning have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best-effort, heuristic approaches that have all been shown to be vulnerable to sophisticated attacks. More recently, rigorous defenses that provide formal guarantees have emerged, but are hard to scale or generalize. ...
Machine vs Machine: Minimax-Optimal Defense Against Adversarial Examples
Recently, researchers have discovered that the state-of-the-art object classifiers can be fooled easily by small perturbations in the input unnoticeable to human eyes. It is known that an attacker can generate strong adversarial examples if she knows the classifier parameters. Conversely, a defender can robustify the classifier by retraining if she has the adversarial examples. The cat-and-mous...
Spatially Transformed Adversarial Examples
Recent studies show that widely used deep neural networks (DNNs) are vulnerable to carefully crafted adversarial examples. Many advanced algorithms have been proposed to generate adversarial examples by leveraging the Lp distance for penalizing perturbations. Researchers have explored different defense methods to defend against such adversarial attacks. While the effectiveness of Lp distance as...
Attack and Defense of Dynamic Analysis-Based, Adversarial Neural Malware Classification Models
Recently, researchers have proposed using deep learning-based systems for malware detection. Unfortunately, all deep learning classification systems are vulnerable to adversarial attacks where miscreants can avoid detection by the classification algorithm with very few perturbations of the input data. Previous work has studied adversarial attacks against static analysis-based malware classifiers ...
Certified Defenses against Adversarial Examples
While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but often followed by new, stronger attacks that defeat these defenses. Can we somehow end this arms race? In this work, ...
Journal: CoRR
Volume: abs/1711.08001
Issue: -
Pages: -
Publication year: 2017